Abstract. In this paper we apply computer learning methods to diagnosing
ovarian cancer using the level of the standard biomarker CA125 in
conjunction with information provided by mass-spectrometry. We are
working with a new data set collected over a period of 7 years. Using
the level of CA125 and mass-spectrometry peaks, our algorithm gives
probability predictions for the disease. To estimate classification
accuracy we convert probability predictions into strict predictions. Our
algorithm makes fewer errors than almost any linear combination of the
CA125 level and one peak's intensity (taken on the log scale). To check
the power of our algorithm we use it to test the hypothesis that CA125
and the peaks do not contain useful information for the prediction of
the disease at a particular time before the diagnosis. Our algorithm
produces p-values that are better than those produced by the algorithm that
has been previously applied to this data set. Our conclusion is that the
proposed algorithm is more reliable for prediction on new data.

Key words: Online prediction, aggregating algorithm, ovarian cancer, 
mass-spectrometry, proteomics 



1 Introduction 

Early detection of ovarian cancer is important since clinical symptoms sometimes 
do not appear until the late stage of the disease. This leads to difficulties in 
treatment of the patient. Using the antigen CA125 significantly improves the 
quality of diagnosis. However, CA125 becomes less reliable at early stages and
sometimes elevates too late to make use of it. Our goal is to investigate whether 
existing methods of online prediction can improve the quality of the detection of 
the disease and to demonstrate that the information contained in mass spectra 
is useful for ovarian cancer diagnosis in the early stages of the disease. By a
combination of CA125 and peak intensity we mean a decision rule of the form

$$u(v, w, p) = v \ln C + w \ln I_p,$$

where $C$ is the level of CA125, $I_p$ is the intensity of the $p$-th peak, and
$v$, $w$ are taken from the sets described below.

We consider prediction in triplets: each case sample is accompanied by two 
samples from healthy individuals, matched controls, which are chosen to be as 



close as possible to the case sample with respect to attributes such as age, storage 
conditions, and serum processing. In the given triplet of samples of different 
individuals we detect one sample which we predict as cancer. This framework 
was first described in [5]. The authors analyze an ovarian cancer data set and 
show that the information contained in mass-spectrometry peaks can help to 
provide more precise and reliable predictions of the diseased patient than the 
CA125 criterion by itself some months before the moment of the diagnosis. In this
paper we use the same framework and set of decision rules (CA125 combined 
with peak intensity) to derive an algorithm which performs better in some sense 
than any of these rules. 

For our research we use a different, more recent ovarian cancer data set [9],
processed by the authors of [3], with a larger number of items than in [5]. We
combine the decision rules proposed in [3] by using an online prediction algorithm^
and thus obtain our own decision rule. In this paper we use the combining algorithm
described in [13], because it allows us to output a probability measure on a
given triplet and has the best theoretical guarantees for this type of prediction.
In order to estimate classification accuracy, we convert probability predictions
into strict predictions by the maximum rule: we assign weight 1 to the labels
with maximum predicted probability, weight 0 to the labels of the other samples,
and then normalize the assigned weights.
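The maximum rule can be illustrated with a minimal Python sketch. The function name `maximum_rule` is hypothetical, and ties are detected by exact equality of the predicted probabilities, which is an assumption made for simplicity.

```python
def maximum_rule(probs):
    """Convert a probability vector into a strict prediction.

    Labels with the maximum predicted probability get weight 1, all other
    labels get weight 0, and the weights are then normalized.
    Assumption: ties are detected by exact equality.
    """
    top = max(probs)
    weights = [1.0 if p == top else 0.0 for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


print(maximum_rule([0.5, 0.3, 0.2]))  # [1.0, 0.0, 0.0]
print(maximum_rule([0.4, 0.4, 0.2]))  # [0.5, 0.5, 0.0]
```

When two samples tie for the maximum, the normalization step splits the unit weight between them, so the output is still a probability vector.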

We show that our algorithm gives more reliable predictions than the vast 
majority of particular combinations (in fact, more thorough experiments, not 
described here, show that it outperforms all particular combinations). It per- 
forms well on different stages of disease. And when testing the hypothesis that 
CA125 and peaks do not contain useful information for the prediction of the 
disease at its early stages, our algorithm gives better p-values in comparison to 
the algorithm which chooses the best combination; in addition, our algorithm 
requires fewer adjustments. 

Our paper is organized as follows. In Section 2 we describe methods we use 
to give predictions. Section 3 gives a short description of the data set on which 
we work. We show our experiments and results in Section 4, separated into de- 
scription of the probability prediction algorithm in Subsection 4.1 and detection 
at different stages before diagnosis in Subsection 4.2. Section 5 concludes our 
paper. 

2 Online prediction framework and Aggregating 
Algorithm 

The mathematical framework used in this paper is called prediction with expert 
advice. In this framework different experts predict a sequence of events step by 
step. The ones that make errors suffer loss defined by a chosen loss function. 
The goal of an online prediction algorithm is to combine the experts' predic- 
tions in such a way that at each step the algorithm's cumulative loss is close to 



^ A survey of online prediction can be found in [2].



the cumulative loss of the best expert. Unlike statistical learning theory, online 
prediction does not impose any restrictions on the data generating process. 

A game of prediction consists of three components: the space of outcomes
$\Omega$, the space of predictions $\Gamma$, and the loss function
$\lambda : \Omega \times \Gamma \to \mathbb{R}$, which measures the quality of
predictions. In our experiments we are interested in the Brier game [1], since
it is widely used in probability forecasting.

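Let $\Omega$ be a finite and non-empty set, and let $\Gamma := \mathcal{P}(\Omega)$
be the set of all probability measures on $\Omega$. The Brier loss function is
defined by

$$\lambda(\omega, \gamma) = \sum_{o \in \Omega} \left( \gamma\{o\} - \delta_\omega\{o\} \right)^2. \qquad (1)$$

Here $\gamma \in \Gamma$ and $\delta_\omega \in \mathcal{P}(\Omega)$ is the
probability measure concentrated at $\omega$: $\delta_\omega\{\omega\} = 1$ and
$\delta_\omega\{o\} = 0$ for $o \ne \omega$. For example, if $\Omega = \{1, 2, 3\}$,
$\omega = 1$, $\gamma\{1\} = 1/2$, $\gamma\{2\} = 1/4$, and $\gamma\{3\} = 1/4$, then
$\lambda(\omega, \gamma) = (1/2 - 1)^2 + (1/4 - 0)^2 + (1/4 - 0)^2 = 3/8$.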
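The worked example above can be checked with a short Python sketch. The function name `brier_loss` and the representation of a prediction as a dictionary from outcomes to probabilities are illustrative assumptions.

```python
def brier_loss(outcome, prediction):
    """Brier loss of `prediction` (a dict outcome -> probability) when
    `outcome` is observed: the sum over all o of
    (prediction[o] - 1{o == outcome})^2."""
    return sum((prob - (1.0 if o == outcome else 0.0)) ** 2
               for o, prob in prediction.items())


gamma = {1: 0.5, 2: 0.25, 3: 0.25}
print(brier_loss(1, gamma))  # 0.375, i.e. 3/8
```

For a strict (one-hot) prediction the loss is 0 when the prediction is correct and 2 when it is wrong.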

The game of prediction is being played repeatedly by a learner that has access 
to decisions made by a pool of experts, which leads to the following prediction 
protocol: 



Protocol 1 Prediction with expert advice
  $L_0 := 0$.
  $L_0^k := 0$, $k = 1, \ldots, K$.
  for $N = 1, 2, \ldots$ do
    Expert $k$ announces $\gamma_N^k \in \Gamma$, $k = 1, \ldots, K$.
    Learner announces $\gamma_N \in \Gamma$.
    Reality announces $\omega_N \in \Omega$.
    $L_N := L_{N-1} + \lambda(\omega_N, \gamma_N)$.
    $L_N^k := L_{N-1}^k + \lambda(\omega_N, \gamma_N^k)$, $k = 1, \ldots, K$.
  end for



Here $L_N$ is the cumulative loss of the learner at time step $N$, and $L_N^k$ is
the cumulative loss of the $k$th expert at this step. There are many well-developed
algorithms for the learner; probably the best known are the Weighted Average
Algorithm [8], the Strong Aggregating Algorithm [11, 12], the Weak Aggregating
Algorithm [7], the Hedge Algorithm [4], and Tracking the Best Expert [6]. The basic
idea behind these algorithms is to assign weights to the experts and then to use
their predictions in accordance with their weights in a way that minimizes the
learner's loss. The weights of the experts are changed at each step, which allows
a prediction algorithm to adapt to the sequence of outcomes.

The Strong Aggregating Algorithm, further called the Aggregating Algorithm
or the AA, has the strongest theoretical guarantees for some games with a
"sufficiently convex" loss function, although in some cases its accuracy in
practice may not be the best. We use the Aggregating Algorithm for the
experiments described in this paper, but one can use other online algorithms



to give probability forecasts. In the case of the Brier game with more than two
outcomes only the AA and the Weighted Average Algorithm have theoretical
bounds for their losses; these are derived in the extended arXiv version of [13].
The Aggregating Algorithm has a parameter $\eta$, the learning rate. It is proved
that for the Brier game the best theoretical guarantees are obtained for
$\eta = 1$. The theoretical bound for its cumulative loss at prediction step $N$ is

$$L_N(\mathrm{AA}) \le L_N^k + \ln K \qquad (2)$$

for any expert $k$, where $K$ is the number of experts. The way it makes
predictions is described as Algorithm 1.



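Algorithm 1 Strong aggregating algorithm for the Brier game
  $w_0^k := 1$, $k = 1, \ldots, K$.
  for $N = 1, 2, \ldots$ do
    Read the Experts' predictions $\gamma_N^k$, $k = 1, \ldots, K$.
    Set $G_N(\omega) := -\ln \sum_{k=1}^{K} w_{N-1}^k e^{-\lambda(\omega, \gamma_N^k)}$, $\omega \in \Omega$.
    Solve $\sum_{\omega \in \Omega} (s - G_N(\omega))^+ = 2$ in $s \in \mathbb{R}$.
    Set $\gamma_N\{\omega\} := (s - G_N(\omega))^+ / 2$, $\omega \in \Omega$.
    Output prediction $\gamma_N \in \mathcal{P}(\Omega)$.
    Read observation $\omega_N$.
    $w_N^k := w_{N-1}^k e^{-\lambda(\omega_N, \gamma_N^k)}$, $k = 1, \ldots, K$.
  end for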
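The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the experiments: the function names are hypothetical, predictions are lists indexed by outcome, the weights are renormalized at every step (which shifts $G_N$ by a constant and leaves the prediction unchanged), and the learning rate `eta` is exposed as a parameter on the assumption that the same substitution rule is used for values of `eta` not exceeding 1.

```python
import math


def brier_loss(outcome, prediction):
    """Brier loss of a probability list when the outcome index is observed."""
    return sum((p - (1.0 if o == outcome else 0.0)) ** 2
               for o, p in enumerate(prediction))


def aa_predict(weights, expert_preds, eta=1.0):
    """One prediction step of the Aggregating Algorithm for the Brier game."""
    n_outcomes = len(expert_preds[0])
    total = sum(weights)
    # Generalized prediction G(omega) for every possible outcome omega.
    G = [-math.log(sum(w / total * math.exp(-eta * brier_loss(o, g))
                       for w, g in zip(weights, expert_preds))) / eta
         for o in range(n_outcomes)]
    # Solve sum_omega (s - G(omega))^+ = 2 for s: try the m smallest values
    # of G as the active set and keep the last consistent candidate.
    prefix = 0.0
    s = None
    for m, g in enumerate(sorted(G), start=1):
        prefix += g
        candidate = (2.0 + prefix) / m
        if candidate > g:
            s = candidate
    return [max(s - g, 0.0) / 2.0 for g in G]


def aa_update(weights, expert_preds, outcome, eta=1.0):
    """Multiply each weight by exp(-eta * loss) and renormalize."""
    new = [w * math.exp(-eta * brier_loss(outcome, g))
           for w, g in zip(weights, expert_preds)]
    total = sum(new)
    return [w / total for w in new]


# Two experts giving strict predictions on a triplet; the second sample is the case.
experts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
weights = [1.0, 1.0]
print(aa_predict(weights, experts))
weights = aa_update(weights, experts, outcome=1)
print(weights)
```

If all experts announce the same prediction, the learner's prediction coincides with it, which gives a quick sanity check of the substitution step.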



3 Data set 

We are working with a data set [3] that was collected over a period of 7 years
and contains patients with the disease (referred to as cases) and patients who
were healthy throughout this period (referred to as controls). Describing the
collection process is not a goal of this paper, so we do not discuss it in detail.
A more detailed description of the data set and the peak extraction procedures
can be found in [9] and [3]. This paper develops further the analysis performed
in [3].

We consider prediction in triplets. There are 881 samples in total: 295 cases and
586 matched controls. There are up to 5 samples for each of the cases. The
information for each sample includes the value of CA125, the time to diagnosis,
the intensities of 67 mass-spectrometry peaks, and other attributes. Time to
diagnosis is the time interval, measured in months, between the date when the
measurement was taken and the date when OC was diagnosed, or the date of
operation. Peaks are ordered by their frequency, that is, the percentage of
samples having a non-aligned peak. We have 67 peaks of frequency more than 33%.
For classification purposes we exclude cases with only one matched control and
cases lacking suitable information. As a result, we have 179 triplets containing
358 control samples and 179 case samples taken from 104 individuals. Each triplet
is assigned a time to diagnosis, defined as the time to the moment of diagnosis
of the case sample in this triplet.



4 Experiments 



This section describes two experiments. The first is a study of probability
prediction of ovarian cancer. The second checks that our results are not
accidental by calculating p-values.

4.1 Probability prediction of ovarian cancer

The aim of this experiment is to demonstrate how we give probability predictions
for samples in a triplet and compare them to predictions using CA125 only. The
outcome of each event can be represented as a vector (1, 0, 0), (0, 1, 0), or (0, 0, 1).
The prediction of CA125 is represented as a vector $(a_1, a_2, a_3)$. This vector is
obtained by applying the maximum rule to CA125 levels.

We use the following procedure to construct other predictors combining 
CA125 and peak intensities. For each patient we calculate values 

$$u(v, w, p) = v \ln C + w \ln I_p, \qquad (3)$$

where $C$ is the level of CA125, $I_p$ is the intensity of the $p$-th peak,
$p = 1, \ldots, 67$, $v \in \{0, 1\}$, $w \in \{-2, -1, -1/2, 0, 1/2, 1, 2\}$. The
total number of different combinations, or experts, is 537: $402 = 6 \times 67$
for $v = 1$, $w \ne 0$; $134 = 2 \times 67$ for $v = 0$; and 1 for $v = 1$, $w = 0$.
The authors of [3] show how such combinations can predict cancer well up to
15 months before diagnosis.
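The count of 537 experts can be reproduced with a short Python sketch. Several details here are assumptions made for illustration: for $v = 0$ only the sign of $w$ is taken to matter (one reading of the count $2 \times 67$), the sample with the largest value of $u$ is taken as the predicted case, and all function names and the data layout (a triplet as a list of `(ca125, intensities)` pairs) are hypothetical.

```python
import math

PEAKS = range(1, 68)                    # peak indices p = 1, ..., 67
W_NONZERO = [-2, -1, -0.5, 0.5, 1, 2]   # non-zero values of w


def build_experts():
    """Enumerate experts as (v, w, p) triples."""
    experts = [(1, w, p) for p in PEAKS for w in W_NONZERO]  # v = 1, w != 0
    # Assumption: for v = 0 only the sign of w changes the ranking in a triplet.
    experts += [(0, w, p) for p in PEAKS for w in (-1, 1)]
    experts.append((1, 0, None))                              # CA125 alone
    return experts


def score(expert, ca125, intensities):
    """u(v, w, p) = v ln C + w ln I_p for one sample."""
    v, w, p = expert
    value = v * math.log(ca125)
    if w != 0:
        value += w * math.log(intensities[p])
    return value


def strict_prediction(expert, triplet):
    """Maximum rule applied to the scores of the three samples.

    Assumption: the sample with the largest score is predicted as the case.
    """
    scores = [score(expert, c, i) for c, i in triplet]
    top = max(scores)
    hits = [1.0 if s == top else 0.0 for s in scores]
    return [h / sum(hits) for h in hits]


print(len(build_experts()))  # 537
```

Each expert produced this way outputs a strict prediction on a triplet, which is the form in which the experts' predictions enter the Aggregating Algorithm.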

For online prediction purposes we sort all the triplets by the date of measurement
of the case sample. At each step we give the probability of being diseased
for each person in the triplet, that is, numbers $p_1, p_2, p_3 \ge 0$ such that
$p_1 + p_2 + p_3 = 1$. We choose the uniform initial distribution on the experts
and the theoretically optimal value $\eta = 1$ for the parameter of the Aggregating
Algorithm. The evolution of the cumulative Brier loss of each expert minus the
cumulative loss of our algorithm over all the 179 triplets is presented in
Figure 1. Clearly, the line for the AA is zero since we subtract its loss from
itself. Experts whose line is below zero are better than the AA; experts whose
line is above zero are worse. The x-axis presents the triplets in chronological
order. We can see from Figure 1 that the Aggregating Algorithm predicts better
than most experts in our class after about 54 triplets, in particular better than
CA125. At the end the AA is better than all the experts. The group of lines
clustered at the top of the graph, separated from the main group, corresponds to
experts that do not include CA125. They make relatively many mistakes, especially
at late stages of the disease, and accumulate a large loss. This shows that the
probability predictions of the AA are more precise than the predictions of the
experts interpreted as probability predictions. Moreover, we can be sure that the
loss of the Aggregating Algorithm will never be much worse than the loss of the
best expert, since there is a theoretical bound for it [13].

One could object that this comparison is not fair, because we allow the experts
to give only strict predictions, whereas our algorithm is more flexible, so its
Brier loss is smaller. On the other hand, it is not trivial to find experts which
make probability




Fig. 1. Cumulative loss of probability predictions of the Aggregating Algorithm
and other predictors over all the triplets.



Fig. 2. Cumulative loss of strict predictions of the categorical AA and other
predictors over all the triplets.



predictions, or to convert CA125 into probabilities of the disease for each sample
in a triplet, so this approach presents one way to generate them.

In order to make a stricter comparison, we allow the AA to make only
strict predictions and use the maximum rule to convert probability predictions
into strict predictions. We will further refer to this algorithm as the categorical
AA. If we calculate the Brier loss, we get Figure 2. We can see that the categorical
AA still beats CA125 at the end in the case where it gives strict predictions.
The final performance is the performance on the whole data set. In this case the
loss of the categorical AA is larger than the loss of some predictors. It is useful
to know the specific combinations which perform well in this experiment. At the
last step the best performance is achieved by the combinations

$$\ln C - \ln I_3, \quad \ln C - \tfrac{1}{2} \ln I_3, \quad \ln C - \ln I_2, \quad \ln C - \ldots \qquad (4)$$

After them follow combinations with peaks 50, 2, 7, 1, 34, 47.

4.2 Prediction on different stages of the disease 

Our second experiment aims to investigate whether it is possible to predict
better than CA125 at early stages of the disease. In this experiment we follow
the approach proposed in [3]. We consider 6-month time intervals with starting
points $t = 0, 1, \ldots, 16$ months before diagnosis. We will show below that our
predictions are not reliable for earlier stages. For each interval we select only
the triplets from that interval, taking the latest one for each case patient if
there is more than one. We denote the set of triplets for the interval starting
at $t$ of length $\theta$ by $S_{t,\theta}$. We use $\theta = 6$.
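The selection of triplets for one time interval might be sketched as follows. The record layout (dictionaries with `patient_id` and `months_to_diagnosis` keys) and the function name are hypothetical, the interval is taken to be closed, and "latest" is interpreted as the most recent measurement, i.e. the smallest time to diagnosis within the interval; the source does not spell out these details.

```python
def select_triplets(triplets, t, theta=6):
    """Return one triplet per case patient for the window [t, t + theta]."""
    chosen = {}
    for trip in triplets:
        months = trip["months_to_diagnosis"]
        if not (t <= months <= t + theta):
            continue
        best = chosen.get(trip["patient_id"])
        # Assumption: "latest" = closest to diagnosis inside the window.
        if best is None or months < best["months_to_diagnosis"]:
            chosen[trip["patient_id"]] = trip
    return list(chosen.values())


data = [
    {"patient_id": "a", "months_to_diagnosis": 2.0},
    {"patient_id": "a", "months_to_diagnosis": 5.0},
    {"patient_id": "b", "months_to_diagnosis": 7.5},
    {"patient_id": "c", "months_to_diagnosis": 4.0},
]
print(select_triplets(data, t=0))
```

Keeping a single triplet per case patient avoids counting the same individual several times within one interval.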



In this experiment we do not use a uniform initial weight distribution on the
experts for the Aggregating Algorithm. Instead, we assume that the importance of
a peak decreases as its number increases in accordance with a power law, and
that different combinations including the same peak have the same importance.
This makes sense because peaks are sorted by their frequency in the data set,
so peaks further down the list are less frequent and important for fewer people.
Our specific weighting scheme is that the combinations with peak 1 have initial
weight $1 = d^0$, the combinations with peak 2 have initial weight $d^{-1}$, etc.
We empirically choose the coefficient $d = 1.2$ for this distribution and the
parameter $\eta = 0.65$ for the AA. The number of errors is calculated as half of
the Brier loss, which corresponds to counting errors in the case where predictions
are strict. Figure 3 shows the fraction of erroneous predictions made by different
algorithms over different time periods. It presents values for CA125, for the
Aggregating Algorithm, and for the best single combination of the form (3). We
also include the fractions of erroneous predictions for the three best
combinations (4), as peaks 2 and 3 were noted in [3] to perform well.
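A minimal sketch of this weighting scheme in Python is given below. Experts are represented as hypothetical `(v, w, p)` triples; the weight assigned to an expert that uses no peak (CA125 alone) is not stated in the text, so giving it the same weight as peak 1 is an assumption. The helper `errors_from_brier` illustrates the remark that half of the Brier loss of a strict prediction counts errors.

```python
def initial_weights(experts, d=1.2):
    """Initial weights proportional to d^-(p - 1) for an expert using peak p."""
    raw = []
    for _v, _w, peak in experts:
        # Assumption: an expert without a peak is weighted like peak 1.
        exponent = 0 if peak is None else peak - 1
        raw.append(d ** (-exponent))
    total = sum(raw)
    return [x / total for x in raw]


def errors_from_brier(brier_loss_value):
    """Half of the Brier loss: 0 for a correct strict prediction, 1 for a wrong one."""
    return brier_loss_value / 2.0


experts = [(1, -1, 1), (1, 1, 1), (1, -1, 2), (1, -1, 3)]
weights = initial_weights(experts)
print(weights)
print(errors_from_brier(2.0))  # 1.0
```

Normalizing the weights does not change the predictions of the Aggregating Algorithm, since only the ratios between the weights matter.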
Fig. 3. Fraction of erroneous predictions over different time periods of
different predictors



Fig. 4. The logarithm of p-values for different algorithms



Figure 3 shows that the performance of the Aggregating Algorithm is at
least as good as the performance of CA125 at all stages before diagnosis. For the
period 9-13 months the combination $\ln C - \ln I_3$ performs better than the AA,
but at late stages (0-8 months) it performs worse. Other combinations are even
worse. Thus we can say that instead of choosing one particular combination, we
should use the Aggregating Algorithm to mix all the combinations. This allows
us to predict well on some stages of the disease.

The choice of the coefficients for the AA requires us to check that our results
are not accidental. Since the amount of data we have does not allow us to carry
out a reliable cross-validation procedure, we follow the approach to calculating
p-values proposed in [5]. This approach was applied to combinations (3) in [3].
For each stage of the disease, we test the null hypothesis that peak intensities
and CA125 do not carry any information relevant for predicting labels. Except
for the earliest stages, we show that either this hypothesis is violated or some
very unlikely event has happened.

We calculate p-values for testing the null hypothesis. The p-value can be
defined as the value taken by a function $p$ satisfying

$$\mathrm{Probability}(p \le \delta) \le \delta$$

for all $\delta \in (0, 1)$ under the null hypothesis. To calculate p-values we
choose the test statistic $T$ described below, apply it to our data, and get the
value $T_0$. Then we calculate the probability of the event that $T \le T_0$ under
the null hypothesis.

Let $\tau$ be a triplet in $S_{t,6}$ and let $\mathrm{err}(\tau, d, \eta)$ be half
the loss of the categorical AA with parameter $\eta$ and initial power distribution
with parameter $d$ on the triplet $\tau$. Then the half loss in each time interval
$[t, t + 6]$ is $\mathrm{Err}(S_{t,6}, d, \eta) = \sum_{\tau \in S_{t,6}} \mathrm{err}(\tau, d, \eta)$,
where $S_{t,6}$ is the set of triplets for the time interval $[t, t + 6]$.

Let us assume that the AA with parameters $d = 1.2$ and $\eta = 0.65$ makes $N_t$
errors on the triplets from $S_{t,6}$. We randomly reassign labels in triplets. Then
for each $t$ we calculate the minimum number of errors $E$ made by the AA by
the rule

$$E = \min_{d \in D,\, \eta \in R} \mathrm{Err}(S_{t,6}, d, \eta).$$

Here $D = \{1.1, 1.2, \ldots, 2.0\}$ and $R = \{0.1, 0.15, 0.2, \ldots, 1.0\}$, so we
consider different values for all parameters of the algorithm. This number is our
test statistic. The p-value is calculated by the Monte-Carlo procedure stated as
Algorithm 2.



Algorithm 2 p-value calculation
  Input: $t$, time to diagnosis.
  Input: $N = 10^4$, number of trials.
  $E_0 := \min_{d \in D,\, \eta \in R} \mathrm{Err}(S_{t,6}, d, \eta)$
  $Q := 0$
  for $j = 1, \ldots, N$ do
    Assign a case label to a randomly chosen sample in each triplet in $S_{t,6}$.
    Calculate $E = \min_{d \in D,\, \eta \in R} \mathrm{Err}(S_{t,6}, d, \eta)$ for this data set.
    if $E \le E_0$ then
      $Q := Q + 1$
    end if
  end for
  Output: $(Q + 1)/(N + 1)$ as a p-value.
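The Monte-Carlo procedure can be sketched generically in Python. The sketch takes the test statistic as a function argument; the statistic `ca125_errors` used in the example is only a stand-in (it counts the errors of the rule "the sample with the highest CA125 is the case") and is not the minimum over $d$ and $\eta$ of the errors of the AA. The names, the data layout, and the use of the estimate $(Q + 1)/(N + 1)$ are assumptions made for illustration.

```python
import random


def monte_carlo_p_value(triplets, labels, statistic, n_trials=1000, seed=0):
    """Estimate a p-value by randomly reassigning the case label in each triplet.

    `statistic(triplets, labels)` returns a number; small values are treated
    as evidence against the null hypothesis.
    """
    rng = random.Random(seed)
    e0 = statistic(triplets, labels)
    q = 0
    for _ in range(n_trials):
        shuffled = [rng.randrange(3) for _ in triplets]
        if statistic(triplets, shuffled) <= e0:
            q += 1
    # Assumption: the (Q + 1) / (N + 1) estimate, which is never exactly 0.
    return (q + 1) / (n_trials + 1)


def ca125_errors(triplets, labels):
    """Stand-in statistic: errors of the rule 'highest CA125 is the case'."""
    return sum(1 for levels, label in zip(triplets, labels)
               if levels.index(max(levels)) != label)


# Toy data: each triplet is a list of three CA125 levels; sample 0 is the case.
toy = [[90.0, 10.0, 12.0] for _ in range(10)]
print(monte_carlo_p_value(toy, [0] * 10, ca125_errors, n_trials=200, seed=1))
```

Because the statistic is recomputed from scratch on every relabelled data set, any tuning performed inside it (such as a minimum over parameter grids) is accounted for by the resulting p-value.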



The logarithms of p-values for different algorithms are presented in Figure 4.
It includes values for the AA. It also includes values taken from [3] for CA125
only and for the algorithm described in [3], which chooses the combination with
the best performance and the most frequent peak for each permutation of labels.
The figure also includes the p-values for the algorithm which chooses the best
combination with one particular peak, 2 or 3.

As we can see, our algorithm has small p-values, comparable with or even
smaller than the p-values for the other algorithms. But our algorithm requires
fewer adjustments, because it does not even choose the peak at each step, but
mixes all peaks in the same manner. Nor does it choose the best parameters for
every time interval: it chooses them once for all the time periods. The precise
values for errors and p-values are presented in Table 1. The lower index e denotes
the half loss for a given algorithm, and the lower index p denotes the p-values
for a given algorithm. The $\mathrm{Min}_e$ column shows the minimum number of
errors made by one of the combinations, and the $p$ column shows the p-values for
the method which chooses the best combination for the current time period
(see [3]). Three further columns show the number of errors for the combinations
$\ln C - \ln I_3$, $\ln C - \frac{1}{2} \ln I_3$, and $\ln C - \ln I_2$. Columns
$3_p$ and $2_p$ contain the p-values for peaks 3 and 2, respectively.



Table 1. Number of errors and p-values for different algorithms
68 0.0001 0.0001 0.0001 0.0001 0.0001
1 56 0.0001 0.0001 0.0001 0.0001 0.0001 47 0.0001 0.0001
3 0.0001 0.0001 0.0001 36 0.0001 0.0001 0.0001 0.0001 0.0001 27 0.0001 0.0001 0.0001 0.0001 0.0001 23 0.0008 0.0006
4 0.0006 0.0007 0.0004 20 0.0010
5 0.0004 0.0028 0.0046 0.0010 17 0.0071 0.0006 0.0141 0.0098 0.0017 17 0.0021 0.0003 0.0019 0.0020 0.0020 20 0.0042 0.0009
5 0.0076 0.0009 0.0010
10 28 14 0.0503 0.0001 0.0003 0.0001 0.0001
11 28 15 0.1028 0.0006 0.0042 0.0004 11 0.0008
12 28 17 0.3164 11 0.0120 10 0.0585 10 11 0.0049 13 0.0033
13 30 16 0.0895 10 0.0011 10 0.0168 10 11 0.0015 13 0.0007
14 25 16 0.4661 10 0.0070 0.0304 10 11 0.0301 11 0.0015
15 20 13 0.5211 0.0124 0.0464 0.0577 0.0022
16 10 6 0.4406 6 0.6708 0.4101 6 6 0.5979 6 0.5165



In practice, one often chooses a suitable significance level for the particular
task. If we choose it to be 5%, then we can see from the table that CA125
classification is significant up to 9 months in advance of diagnosis (the p-values
are less than 5%). At the same time, the results for peak combinations and for the
AA are significant for up to 15 months.



5 Conclusion 



Our results show that the CA125 criterion, which is the current standard for the
detection of ovarian cancer, can be outperformed, especially at early stages. We
have proposed a way to give probability predictions for the disease and showed
that by predicting in this way we suffer less loss than other predictors based on
the combination of CA125 and peak intensities. We carried out another experiment
to investigate the performance of our algorithm at different stages before
diagnosis. We found that the Aggregating Algorithm we use to mix combinations
predicts better than almost any combination. To check that our results are not
accidental, we calculated p-values under the null hypothesis that peaks and CA125
do not give any information about the disease at a particular time before the
diagnosis. Using our test statistic we get small p-values. They show that this
hypothesis can be rejected at the standard significance level of 5% later than
16 months before diagnosis. Our test statistic produces p-values that are never
worse than the p-values produced by the statistic proposed in [3]. There are no
other papers dealing with our database. Other approaches to probability prediction
of ovarian cancer using the CA125 criterion, based on the Risk of Ovarian Cancer
algorithm (see [10]), require multiple statistical assumptions about the data and
a much larger database. Thus they cannot be compared with ours in our setting.

An interesting direction of future research is to consider the prediction of the
probability of the disease for an individual patient, rather than artificially
grouping patients into triplets.